(57) Abstract

The present invention concerns a method for the identification from DNA of a fragment
comprising a simple tandem repeat locus comprising the steps of: i) contacting a DNA
library with at least one hybridisation probe so as to identify a population of DNA fragments
enriched for simple tandem repeats; ii) isolating and cloning said population; and iii) screening
of the resulting DNA library so as to identify an individual fragment comprising a simple tandem
repeat locus. Also provided are simple tandem repeats isolated by the method of the present
invention, characterised in that they may be amplified at least in part by PCR using a specified
pair of primers, together with amplification primers and probes specific to the simple tandem
repeats so isolated. The present invention also provides methods of genetic characterisation
using the aforementioned simple tandem repeats, primers and probes.

IDENTIFICATION OF SIMPLE TANDEM REPEATS

The present invention concerns methods for the identification from DNA, in 
particular from genomic DNA, of a fragment comprising a Simple Tandem Repeat (STR) locus, 
together with simple tandem repeat loci, primer sequences and hybridisation probes, as well as
methods of genetic characterisation using the aforementioned simple tandem repeats, primers 
and probes. 

Hybridisation techniques have been used in the past as preparative steps in the
selection from cDNA libraries of sequences hybridising with cloned genomic DNA (see
Parimoo, S., et al. (1991). Proc. Nat. Acad. Sci. U.S.A. 88: 9623-9627; and Lovett, M., et al.
(1991). Proc. Nat. Acad. Sci. U.S.A. 88: 9628-9632) and in the isolation of (AC)n dinucleotide
repeats from the mouse genome using immobilised short oligonucleotides as the hybridisation
"target" (see Karagyasov, L., et al. (1993). Nucleic Acids Res. 21: 3911-3912). However, these
techniques have various disadvantages, a primary disadvantage being that the use of short
oligonucleotide "targets" results in hybridisation with a relatively restricted range of sequences
due to the inability of the "target" oligonucleotides to tolerate mismatches.

This inability to tolerate mismatches in the screening of libraries using relatively
short oligonucleotides composed of perfect repeats (see, for example, Li, S.-H., et al. (1993).
Genomics 16: 572-579) is further exemplified by the fact that loci containing frequently
interspersed repeat unit variants may not be reliably detected. Interspersion of different repeat
unit types is a common feature of many highly variable minisatellite loci, and has been exploited
in the analysis of the mechanisms involved in the evolution of these longer repeat loci (see, for
example, Armour, J.A.L., et al. (1993). Human Mol. Genet. 2: 1137-1145).

Hybridisation screening of unenriched genomic libraries in the past has been
successful in isolating simple tandem repeat loci from the human genome (see, for example,
Weissenbach, J., et al. (1992). Nature 359: 794) but suffers from the general disadvantage of
inefficiency; very large numbers of clones need to be screened from small-insert libraries for
each positively hybridising clone, and large-insert (cosmid) clones require subsequent
subcloning or other manipulation (see, for example, Edwards, A., et al. (1991). Am. J. Human
Genet. 49: 746-756; and Lagerström, M., et al. (1991). PCR Methods Appl. 1: 111-119) to determine
sequence immediately adjacent to the repeat block.

The present invention overcomes the aforesaid problems of the prior art and
provides a simple and efficient method for the isolation of simple tandem repeat loci from DNA
libraries and in particular from genomic DNA. This method is based upon prior enrichment for
tandemly repeated DNA fragments, an enrichment which is sensitive to the presence of
tandemly repeated DNA, but which is tolerant to the positioning of the tandem repeats and to
mismatches. Only after enrichment by this method are the fragments cloned, resulting in a
preselected library in which a significant proportion of clones comprise simple tandem repeats.
This allows the rapid screening for and identification of usefully polymorphic loci by the simple
examination and comparison of loci in different, possibly unrelated, individuals. Since the
selected clones contain short inserts, the effort necessary to identify and sequence the region of
the tandem array is also reduced. The use of long tandemly repeated hybridisation targets in the
present invention for hybridisation screening for minisatellite clones allows the isolation of
relatively long arrays and tolerates mismatched variant repeat arrays, allowing the identification
of a wide range of minisatellite clones, something which has been hitherto impossible to
achieve.

In a first aspect of the present invention there is provided a method for the
identification from DNA of a fragment comprising a simple tandem repeat locus comprising the
steps of:

i) contacting a DNA library with at least one hybridisation probe so as to identify
a population of DNA fragments enriched for simple tandem repeats;

ii) isolating and cloning said population; and

iii) screening of the resulting DNA library so as to identify an individual
fragment comprising a simple tandem repeat locus.

The DNA library may be a genomic DNA library; the genomic DNA library may
be any convenient population of DNA fragments such as human DNA, DNA from non-human
species or subgenomic DNA libraries such as those generated by PCR from flow sorted
chromosomes (see Telenius, H., et al. (1992). Genomics 13: 718-725). The genomic DNA
library may be obtained by restriction digestion of genomic DNA.

The average fragment size within the DNA library may be less than 1.5 kilobases,
and may be less than about one kilobase. The fragment size may be from about 400bp to about
1000bp.

The hybridisation probe or set of probes may be immobilised on a solid phase
such as a nylon membrane and may identify a particular class of simple tandem repeats. Such
classes may include dimeric, trimeric, tetrameric, pentameric and hexameric tandem repeats
such as trimeric or tetrameric repeats. Particular oligonucleotide probes for use in the present
invention may include oligonucleotide probes comprising a tandemly repeated region of greater
than 200bp. The probe may comprise repeats having at least 70%, such as 80% or 90%,
similarity to a given repeat sequence. The hybridisation probe may be a set of probes comprising
mixed trimeric or tetrameric repeat DNA.

The population of DNA fragments enriched for simple tandem repeats may be
amplified prior to cloning and this may be effected by PCR amplification. Universal linker
sequences may be ligated to the ends of individual fragments, possibly prior to the enrichment
procedure, and linker sequence specific primers may then be used to amplify the enriched
population. Linker sequences may then be removed, for example by restriction digestion, prior
to cloning.

According to the present invention there is also provided a method for the
identification from genomic DNA of a fragment comprising a simple tandem repeat locus
comprising the steps of:

i) ligating universal linker sequences to the ends of fragments comprised in a
genomic DNA library so as to form a library for PCR amplification;

ii) contacting said PCR library with at least one hybridisation probe so as to
identify a population of library fragments enriched for simple tandem repeats;

iii) separating and amplifying said population by PCR; and

iv) cloning and screening the resulting amplification products so as to isolate an
individual fragment comprising a simple tandem repeat locus.

Cloning may be effected using any convenient cloning procedure and vector (for
example pBluescriptII (Stratagene)) such as those described by Sambrook, J., Fritsch, E.F. and
Maniatis, T. (1989). Molecular Cloning, A Laboratory Manual. Cold Spring Harbor Laboratory
Press.

Screening may be effected using any convenient hybridisation probe or set of
probes comprising simple tandem repeat sequences. These may be the same as those disclosed
above in respect of the enrichment procedure. Individual clones comprising simple tandem
repeat loci may be analysed using conventional techniques to determine, for example, specific
sequence information.

By "simple tandem repeat locus" is meant a tandemly repeated region having a
periodicity of up to eight bases, for example up to six bases, such as up to five, four or three
bases. Particular simple tandem repeat loci may have a periodicity of up to four or up to three
bases.
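
By way of illustration only, the definition above can be expressed as a short Python sketch that reports perfect tandem arrays with a periodicity of up to eight bases. The function name, the minimum period of two bases and the minimum of four copies are assumptions made for the example and are not taken from the specification; the sketch also detects perfect arrays only, whereas the hybridisation method described here tolerates variant repeats.

```python
import re

def find_simple_tandem_repeats(seq, max_period=8, min_copies=4, min_period=2):
    """Return (start, unit, copies) for perfect tandem arrays found in seq.

    min_period and min_copies are illustrative thresholds, not values
    taken from the specification.
    """
    seq = seq.upper()
    hits = []
    for period in range(min_period, max_period + 1):
        # A unit of `period` bases followed by at least min_copies - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (period, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves repeats of a shorter motif,
            # e.g. CAGCAG is reported once, under period 3.
            if any(unit == unit[:p] * (period // p)
                   for p in range(1, period) if period % p == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // period))
    return hits

# Hypothetical fragment containing a (CAG)12 array.
example = "ttacc" + "cag" * 12 + "ggctcc"
print(find_simple_tandem_repeats(example))
```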

The method of the present invention has been used to identify a number of simple 
tandem repeat loci as disclosed in Table 2, together with corresponding flanking primer 
sequences disclosed in Table 3 and hybridisation probes which specifically identify such loci. 

Therefore, according to the present invention there are also provided simple 
tandem repeat loci for use in a method of treatment or diagnosis of the human or animal body 
characterised in that they may be amplified at least in part by PCR using any pair of primers as 
disclosed in Table 2. 

The simple tandem repeats may comprise at least the sequence of at least any one
of sequences 1-47. Where a pair of sequences are indicated (see Table 4 and sequences 1-47),
the first part of the sequence may be separated from the second part of the sequence by an
intervening sequence. This intervening sequence may comprise the repeat block of the simple
tandem repeat.

The simple tandem repeats may be polymorphic. Many of the STR loci so
identified have been shown to have unexpectedly high polymorphism. Therefore, they may have
a heterozygosity of at least 80%; they may have a heterozygosity of at least 85%; they may have
a heterozygosity of at least 90%.
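
As a minimal sketch of how such figures may be derived, observed heterozygosity can be taken as the fraction of typed individuals carrying two different alleles at a locus. The Python below assumes genotypes are recorded as pairs of allele sizes; the function names and the allele sizes in the example are hypothetical and are not data from the tables.

```python
def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two alleles differ."""
    genotypes = list(genotypes)
    if not genotypes:
        raise ValueError("at least one genotype is required")
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)

def allele_count(genotypes):
    """Number of distinct alleles seen across all genotypes."""
    return len({allele for genotype in genotypes for allele in genotype})

# Hypothetical allele sizes (in nucleotides) for four individuals.
typed = [(180, 184), (184, 184), (176, 188), (180, 192)]
print(observed_heterozygosity(typed), allele_count(typed))
```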

The present invention also provides amplification primers specific to the
aforesaid simple tandem repeats for use in a method of treatment or diagnosis of the human or
animal body; the method of amplification may be PCR.

The present invention also provides probes specific to at least part of the
aforesaid simple tandem repeats for use in a method of treatment or diagnosis of the human or
animal body.

According to further aspects of the present invention there are also provided
methods of genetic characterisation wherein sample DNA is characterised by reference to at
least one of the aforesaid loci, primer sequences and probes. The method of genetic
characterisation may comprise either the use of at least one hybridisation probe or it may
comprise the use of polymerase chain reaction (PCR) primers specific to at least one of the
aforementioned loci in order to amplify selectively the simple tandem repeat locus. The PCR
primers may comprise at least one of the primers and probes of the present invention. The
method of genetic characterisation may be used in genetic mapping studies such as linkage
studies, and may be used in the genetic analysis and diagnosis of inherited or acquired disease
alleles.

Such techniques of genetic characterisation may allow the generation of
individual 'identities' specific for one or more polymorphic loci, possibly those of the present
invention. The generation of such individual 'identities' may be used to identify and characterise
family relationships and may be used for e.g. forensic testing and in any technique which uses
simple tandem repeats and their polymorphisms, such techniques possibly identifying, for
example, inherited diseases and their causes.

Throughout the present application, the standard IUPAC nucleotide
representation procedures are used. It should be noted that in these, R = A or G; Y = T or C; K
= G or T; S = G or C; W = A or T; N = any base.
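
For readers handling the sequence listing computationally, the codes noted above can be sketched as a lookup table in Python. Only the ambiguity codes listed in the preceding paragraph are included; the helper names are hypothetical and are introduced purely for illustration.

```python
import re

# Ambiguity codes as listed in the text, plus the four unambiguous bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "S": "CG", "W": "AT", "N": "ACGT",
}

def iupac_to_regex(pattern):
    """Translate a sequence containing ambiguity codes into a regex string."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError for codes not listed above
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def matches(pattern, seq):
    """True if seq is one of the sequences represented by pattern."""
    return re.fullmatch(iupac_to_regex(pattern), seq.upper()) is not None

print(iupac_to_regex("ggctr"))  # GGCT[AG]
```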

The invention will be further apparent from, but not limited to, the following
description and examples with reference to the several accompanying figures and tables.

Of the Figures,

Figure 1 shows a schematic summary of enrichment by filter hybridisation.

Figure 2 shows an enrichment for tandem repeats after filter hybridisation. Three
replicate filters (A, B and C) bearing DNA from over 1000 clones from an enriched library were
screened by hybridisation using a mixed triplet probe (left panel) or mixed tetramers (right
panel).

Figure 3 shows examples of genotyping at polymorphic tetranucleotide repeat
arrays using 32P end-labelled primers and denaturing polyacrylamide gels; ten unrelated
individuals were typed at wglel2 (D7S822) and wglc4 (D8S580). Estimated fragment sizes (nt)
are indicated.

Figure 4 shows the number of alleles observed at the 46 tandem arrays identified
in Table 1, plotted against total array length. Triangular symbols denote
triplet repeat arrays and squares denote tetrameric arrays; loci found to be monomorphic (one
allele only) have been included in the analysis.

Of the Tables,

Table 1 shows characterisation by sequence analysis of 54 positively-hybridising
clones from the repeat-enriched library, resulting in the first 24 polymorphic loci of Table 2.

Table 2 shows properties of 24 polymorphic loci characterised in an initial study,
together with thirteen subsequently identified loci. The number of alleles and heterozygosity
levels shown were those observed in the analysis of 20 or more unrelated individuals from the
CEPH pedigrees. EMBL is the European Molecular Biology Laboratory data bank; GDB is the
Genome Database.

Table 3 shows PCR primer sequences and annealing temperatures for the
polymorphic loci described; primers shown marked with an asterisk were end-labelled for
genotyping on denaturing polyacrylamide gels.

Table 4 shows the simple tandem repeats and their sequence number or numbers.
Where more than one number is given, the first part of the sequence may be separated from the
second part of the sequence by an intervening sequence.

Results 

During the characterisation of sequences isolated during an initial study (see
Table 1), a total of 54 positively hybridising clones were analysed, giving 46 different sequences
containing tandem repeat arrays (27 tetrameric; 19 triplet). These sequences were then used to
design PCR primers (Table 1). Interestingly, two of these sequences showed near-perfect
matches with sequences in the Genbank sequence database.

Of these sequences containing tandem repeat arrays, further characterisation
revealed that 24 of them showed length polymorphisms when tested in 4 unrelated individuals.
These polymorphic loci are shown in Table 2 together with 13 subsequently identified loci.

At these 37 new polymorphic loci, heterozygosity levels range from 9% to 95%
(Table 2). The simplest factor useful in the prediction of variability was found to be the nature
of the repeat block; tetramer repeat arrays not only showed more frequent polymorphism than
triplets (18/27 v 6/19 in the initial study), but the average heterozygosity of those loci which
were polymorphic was also higher (75% v 34% in the initial study). The locus represented by
clone wglc3 contains a modestly variable triplet repeat (GGC)n array. However, the
non-repetitive region of the sequence determined shows 98% similarity over 148 bases with the
published sequence of the human cDNA for translation initiation factor 4D (eIF4D) (see
Smit-McBride, et al. (1989). J. Biol. Chem. 264: 1578-1583). The region of near-identity begins
abruptly at position 330 of wglc3, corresponding to position 22 of the eIF4D cDNA, which
suggests that the fragment isolated in clone wglc3 may span an intron-exon boundary from the
human eIF4D gene, thus placing the (GGC)n repeat array within the preceding intron.

Although identified as containing tandem repeats by hybridisation, 5 of the 54
clones initially examined (about 10%) did not contain a recognisable tandem array. We assume
that these clones might contain short imperfect arrays at some distance into the initial sequence
analysis; where long (>8 repeat) arrays of near-perfect repeats were present they could easily be
identified on sequencing autoradiographs, even when some distance into the clone.
Alternatively, they might represent sequences rescued because of cross-hybridisation to a
(non-repeated) part of one of the sequences contributing to the membrane-bound "target" DNA.

Hence the method of the present invention is a rapid and efficient method for
isolating simple sequence loci, in this case exemplified by tri- and tetrameric repeat loci from
the human genome.

Methods 

The methods described below were used for the hybridisation selection of simple
repeat loci, the characterisation of isolated sequences and the characterisation of these novel
polymorphic simple sequence loci.

Hybridisation Selection of Simple Repeat Loci

Figure 1 shows the general strategy for the hybridisation selection of simple
repeat loci. More specifically, human MboI fragments (400-1000bp) were ligated to
(SAULA/SAULB) linkers to give a "whole-genome" PCR library (see Kinzler, K.W. and
Vogelstein, B. (1989). Nucleic Acids Res. 17: 3645-3653) from which tandem repeat-containing
fragments were selected and reamplified. This population of molecules was denatured and
incubated with two small nylon filters, one bearing mixed trimeric repeat DNA, the other
bearing mixed tetrameric repeat arrays as described below. After hybridisation overnight at
65°C, fragments hybridising to each filter were recovered and reamplified using the linker
primer SAULA. The reamplified fractions were compared with the input DNA by Southern
blotting and probing with the pooled triplet or tetramer sequences used for selection, and were
shown to be highly enriched for the respective sequences. Dot-blot analysis of serial dilutions
showed that the enrichment was at least 50-100 fold.
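
The size-selected MboI fragments that form the starting material for this library can be illustrated in silico. The following Python sketch cuts a sequence immediately 5' of every GATC site (the MboI recognition sequence) and keeps internal fragments within a size window; the function name and the synthetic test sequence are assumptions made for the example, and only the top strand is modelled.

```python
def mboi_fragments(seq, min_len=400, max_len=1000):
    """Return internal MboI fragments of seq whose length lies in the window.

    MboI cuts 5' of its GATC site, so every internal fragment begins with
    GATC on the strand shown. End pieces lacking a site on one side are
    ignored because they would not carry two ligatable ends.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find("GATC")
    while pos != -1:
        cuts.append(pos)
        pos = seq.find("GATC", pos + 1)
    fragments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    return [f for f in fragments if min_len <= len(f) <= max_len]

# Synthetic sequence with three sites giving fragments of 500 and 100 bases.
demo = "AA" + "GATC" + "A" * 496 + "GATC" + "C" * 96 + "GATC" + "TT"
print([len(f) for f in mboi_fragments(demo)])
```

Consistent with this, the cloned inserts in the sequence listing begin and end with gatc.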

DNA fragments from the reamplified, enriched DNA fractions were digested with
MboI (to remove linkers) and cloned into pBluescriptII vectors. Clones from the resulting
library, containing both triplet and tetramer selected sequences, were picked into ordered array
and screened by hybridisation to check the frequency of repeat-containing clones (see Figure 2
for example). The probe mixtures originally used for filter hybridisation enrichment were also
used as the hybridisation probes at this stage, and approximately 30% of clones initially studied
gave positive hybridisation signals.

Characterisation of isolated sequences 

The positively-hybridising clones were analysed for the initial characterisation
of the enriched fragments. For each clone a first round of sequence analysis was performed
using single stranded templates. In clones where the distal portion of unique sequence DNA
could not be determined directly, further sequence information was derived, either from the
other end of the insert (using double-stranded plasmid template) or by using a specific primer
proximal to the array to extend the region already sequenced. Where a specific primer was used
for sequence extension, this primer was subsequently used as an amplimer for PCR.

Novel polymorphic simple sequence loci 

Specific products were amplified by PCR from all of the 46 loci isolated. At most
loci, specific amplification products could be satisfactorily resolved on agarose (NuSieve
(RTM)) gels. At loci derived from expanded polyadenylate tracts, or where finer resolution was
required, an end-labelled primer was used, and fragments resolved on denaturing polyacrylamide
gels as described above; examples of tetranucleotide loci typed by this method are shown in
Figure 3. The polymorphic loci identified are shown in Table 2.

The loci have been initially mapped using a combination of somatic cell hybrid
analysis and linkage in a small number of CEPH pedigrees as described for minisatellite loci
(see Armour, J.A.L., et al. (1990). Genomics 8: 501-512; and Armour, J.A.L., et al. (1992).
Human Mol. Genet. 1: 319-323). At many loci, the placement of recombinations could be
inferred from linkage analysis, and thus a tentative interval containing the locus could be defined
even from the analysis of only two or three pedigrees. These putative placements determined
by linkage analysis (Table 2) were made using the NIH/CEPH maps of the human chromosomes
(see NIH/CEPH Collaborative Mapping Group (1992). Science 258: 148-162) as a framework.
In all cases a LOD score of 3 or more was required as evidence of linkage.
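
To give a sense of scale for the LOD threshold, the following Python sketch evaluates the textbook two-point LOD score for fully informative, phase-known meioses. This simplified formula is an assumption made for illustration; the specification does not state how the LOD scores were computed, and pedigree analysis in practice is normally carried out with dedicated linkage software.

```python
import math

def lod_score(n_meioses, n_recombinants, theta):
    """Two-point LOD score: log10 of L(theta) / L(0.5).

    Assumes phase-known, fully informative meioses (a simplification).
    theta is the recombination fraction, between 0 and 0.5 inclusive.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("recombinants must not exceed meioses")
    non_recombinants = n_meioses - n_recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible at theta = 0.
        if n_recombinants > 0:
            return float("-inf")
        return n_meioses * math.log10(2.0)
    log_likelihood = (n_recombinants * math.log10(theta)
                      + non_recombinants * math.log10(1.0 - theta))
    return log_likelihood - n_meioses * math.log10(0.5)

# Ten informative meioses with no recombinants just reach the threshold.
print(round(lod_score(10, 0, 0.0), 2))
```

Under these assumptions, ten informative meioses without a recombinant give a LOD of about 3.01, whereas nine do not reach 3.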

General methods 

The following general methods were used in the hybridisation selection of
tandem repeat loci and the characterisation of isolated sequences.

General PCR conditions

PCR was carried out using the buffer described by Jeffreys, A.J., et al. (1990).
Cell 60: 473-485 and 0.05U/µl Taq polymerase; unlabelled primers were added to a final
concentration of 1µM. Cycling conditions were: for the whole genome PCR library, 95°C 1
min/67°C 1 min/70°C 2 min; for amplifying simple repeat loci using locus-specific primers,
95°C 1 min/T°C 1 min/70°C 1 min; the annealing temperature T used for each polymorphic
locus is shown in Table 3. These annealing temperatures gave good results in the PCR buffer
used. In other buffer systems different temperatures may improve genotyping.

For amplification to levels visible after ethidium staining, 100ng genomic DNA
was amplified for 32 cycles. Amplified fragments were resolved by electrophoresis in NuSieve
(RTM) gels (FMC BioProducts, 2.5 - 4.5% according to fragment size) in 0.5x TBE buffer, and
DNA detected by ethidium staining. Loci derived from expanded (Alu) polyadenylate tracts
were most clearly detected after end-labelling of the non-Alu PCR primer and autoradiography
of dried polyacrylamide gels; at other loci polymorphism could be detected on ethidium stained
agarose gels, but the use of polyacrylamide provided added resolution of closely-spaced alleles.
One primer was end-labelled (1.5pmol primer per subsequent PCR reaction) using [γ-32P]ATP
and T4 polynucleotide kinase. This labelled primer and 10pmol unlabelled primer were then
used with 0.05U/µl Taq polymerase in a 10µl PCR. Table 3 shows the primer used for
end-labelling marked with an asterisk. In general, 18 cycles were sufficient to give clean typing
from 100ng genomic DNA; details of PCR conditions and primer sequences for the first 24
polymorphic loci can be found in GDB.

Whole genomic PCR library construction 

SAU linkers were made by annealing equimolar amounts of SAULA
(5'GCGGTACCCGGGAAGCTTGG3') and 5' phosphorylated SAULB
(5'GATCCCAAGCTTCCCGGGTACCGC3') as described by Royle, N.J., et al. (1992). Proc.
R. Soc. Lond. B 247: 57-61. Human genomic DNA pooled from 20 unrelated individuals was
digested with MboI and a 400-1000bp fraction size-selected after agarose gel electrophoresis.
266ng of DNA from this fraction were ligated with 2µg SAU linkers, a linker:fragment molar
ratio of about 250:1. After a further round of size-selection to remove linker dimers, the library
was amplified using SAULA primer to give products in the 400-1000bp range.
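
The stated molar ratio can be checked with a short calculation. The Python sketch below assumes an average mass of 650 g/mol per base pair, a mean fragment length of 700bp (the midpoint of the 400-1000bp window) and an effective linker length of 22bp (the mean of the 20-base SAULA and 24-base SAULB oligonucleotides); all three values are assumptions made for the estimate rather than figures from the specification.

```python
AVG_MASS_PER_BP = 650.0  # g/mol per base pair; a common approximation for dsDNA

def picomoles_dsdna(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * AVG_MASS_PER_BP)
    return moles * 1e12

fragment_pmol = picomoles_dsdna(266, 700)   # 266ng of size-selected fragments
linker_pmol = picomoles_dsdna(2000, 22)     # 2 micrograms of annealed linker
ratio = linker_pmol / fragment_pmol
print(f"linker:fragment molar ratio is roughly {ratio:.0f}:1")
```

Under these assumptions the estimate comes to roughly 240:1, in line with the ratio of about 250:1 given above.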

Tandem repeat "target" sequences

Both naturally occurring and synthetic sequences were used as target sequences
in hybridisation selection. Where cloned or amplified loci were used as a source of tandemly
repeated DNA, care was taken to use fragments which did not contain human dispersed repeat
elements. The triplet repeat sequences ACC and AGG were selected using a DNA fragment
from a cloned human locus, pMS633, containing about 2kb of interspersed AGG/TGG(=ACC)
repeats. Tandem arrays of the other triplet sequences used (AGC, ACG, ATG, AGT and CCG)
were synthesised as follows: 18mer oligonucleotides of each sequence and its complement were
synthesised, phosphorylated, annealed and ligated into concatemers as described by Vergnaud,
G. (1989). Nucleic Acids Res. 17: 7623-7630. Fragments larger than 200bp were
size-fractionated from the ligated DNA and subjected to cycles of PCR in the absence of primers
to selectively lengthen tandem (rather than inverted) arrays (see Collick, A. and Jeffreys, A.J.
(1990). Nucleic Acids Res. 18: 625-629), and fragments of apparent size greater than 1000bp
recovered from a 1% agarose gel. The triplet sequences AAT, AAG and AAC were not used in
order to avoid heavy bias towards triplet repeats arising from retroposon tails (see Beckman, J.S.
and Weber, J.L. (1992). Genomics 12: 627-631).
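
The equivalence noted above between TGG and ACC repeats follows from the fact that a tandem array reads the same, up to its starting point, on either strand. The Python sketch below reduces a repeat unit to a canonical form (the alphabetically first rotation of the unit or of its reverse complement); the function names are hypothetical. Applied to triplets, it indicates that the ten triplet sequences named in this section correspond to the ten distinct non-homopolymer triplet repeat classes.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def canonical_motif(unit):
    """Alphabetically first rotation of the unit or of its reverse complement."""
    unit = unit.upper()
    candidates = []
    for strand in (unit, reverse_complement(unit)):
        for i in range(len(strand)):
            candidates.append(strand[i:] + strand[:i])
    return min(candidates)

def triplet_repeat_classes():
    """All distinct triplet repeat classes, excluding homopolymers."""
    classes = set()
    for bases in product("ACGT", repeat=3):
        unit = "".join(bases)
        if len(set(unit)) > 1:
            classes.add(canonical_motif(unit))
    return sorted(classes)

named = ["ACC", "AGG", "AGC", "ACG", "ATG", "AGT", "CCG", "AAT", "AAG", "AAC"]
print(canonical_motif("TGG"))                      # ACC
print(sorted({canonical_motif(m) for m in named}) == triplet_repeat_classes())
```

The same helper shows that CATG and CTAG are their own reverse complements, which is why they are described as self-complementary.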

The self-complementary tetrameric repeat sequences CATG and CTAG were
synthesised as 16mer oligonucleotides and assembled into long arrays by ligation and
primer-free PCR. For other tetrameric sequences, cloned or amplified genomic fragments were
used. DNA from the tetramer repeat locus composed of ATGG repeats near the human myelin
basic protein gene (see Boylan, K.B., et al. (1990). Genomics 6: 16-22) was amplified from
human genomic DNA using the primers MBP1 (5'ACAAGGACCTCGTGAATTACAATC3')
and MBP2 (5'ACAGGATTCACTCACATATTCCTG3'), to give fragments of about 1kb. DNA
containing GGCA repeats was amplified from the mouse minisatellite clone p9.2 (see Gibbs,
M., et al. (1993). Genomics 17: 121-128). A subcloned fragment from cosmid G2 (see Armour,
J. et al. (1992). Ann. Hum. Genet. 56: 183) contains about 800bp of interspersed ACCC and
ATCC repeats; the human minisatellite clone pMS630 (see Armour, J.A.L., et al. (1992).
Human Mol. Genet. 1: 319-323) contains the octameric repeat (GGAGGGAA) and was thus
used to select AGGG and AAGG tetramer repeats.

Hybridisation Selection

DNA fractions containing the different trimeric arrays were pooled, denatured
by treatment with alkali (KOH, final concentration 150mM), neutralised by adding 0.25 volumes
of 1M TRIS-HCl pH 4.8, and a total of 1 ng spotted onto a small (3mm x 3mm) piece of nylon
filter (Hybond-N*P, Amersham). When dry, the filter was exposed to ultraviolet light to bind the
DNA. A similar small filter was made using the pooled tetrameric repeat fragments. Since the
filters were to be used to select different types of sequence from the same input DNA, they could
be used together in the same hybridisation. The filters were prehybridised in 1ml phosphate/SDS
buffer (see Church, G.M. and Gilbert, W. (1984). Proc. Nat. Acad. Sci. U.S.A. 81: 1991-1995)
at 65°C and transferred to 100µl of the same buffer at 65°C in an Eppendorf tube.

Input DNA was amplified from the whole genome PCR library and about 1µg
denatured with alkali, neutralised, and added to the buffer containing the filters; the reaction was
covered with paraffin oil and incubated overnight at 65°C. The filters were washed thoroughly
in 0.2x SSC, 0.01% SDS at 65°C. After washing, the DNA bound to each filter was removed
by treatment (at room temperature) with 50µl of 50mM KOH/0.01% SDS, followed by 50µl of
50mM TRIS-HCl (pH 7.5) / 0.01% SDS; the washings from each filter were pooled, and
recovered DNA ethanol precipitated using primer SAULA (final concentration 1µM) as a
"carrier".

Hybridisation-selected DNA was reamplified, digested with MboI and cloned into
pBluescriptII vectors (Stratagene). Clones were picked into ordered array for ease of screening
and clones replicated from microtitre plates onto Nylon filters in groups of four as described by
Brownstein, B.H., et al. (1989). Science 244: 1348-1351.

Sequence determination and analysis

DNA sequence was initially determined from clones hybridising positively with
tandem repeat probes using dideoxynucleotide chain termination with T7 DNA polymerase (see
Tabor, S. and Richardson, C.C. (1987). Proc. Nat. Acad. Sci. U.S.A. 84: 4767-4771) on single
stranded templates. Where this proved insufficient to determine sequence on both sides of a
tandem array, additional sequence was determined either from the other end of the clone using
double-stranded templates, or after extension of the sequence using a specific oligonucleotide
primer; any specific primer used in sequence analysis was subsequently recruited as one of the
amplimers in PCR.

The sequence determined was analysed using the suite of programs developed
at the University of Wisconsin (Genetics Computer Group (1991). Program Manual for
Package. Version 7. April 1991. 575 Science Drive, Madison, Wisconsin, USA 53711); updated
sequence databases were searched using the BLAST (see Altschul, S.F., et al. (1990). J. Mol.
Biol. 215: 403-410) network service.

It will be appreciated that it is not intended to limit the invention to the above
example only, many variations, such as might readily occur to one skilled in the art, being
possible, without departing from the scope thereof as defined by the appended claims.

DNA Sequence Information

Sequence: 1, corresponds to wg0e7; Length 377

5' gatcaaattt attctctcct ttgcacactg gaagtgcaag taacatttct 
51 tccttctcct gctcctcctc ctgataacaa tggtgatgat gatggtgatg
101 atggtggtggtgatggtgatgttggtggtgatgatggtgatgg^atggt
151 ggtgatgatg gtggtggtga tggtggtggt gatggtggtg gtgatggggt
201 ggtgacggtg atgttgacgg tggtggtggt ggtggtgatg gagtggtgat
251 ggagtggtgatgatggtggt gatggtggtg atggcgataa caaacatata
301 ttaagacctt accatggctr ggcatggtgg ctgatrcctg taatcccagc
351 actttgggaggccgaggcgggcagatc

Sequence: 2, corresponds to first part of wgla2; Length 346

5' gatcattcgg aagaaagtgt ggaagcagca gcaaagagtg gaaaatgaaa 
51 agagaaactc tggagaaggc aaggtgggca ggagcaggac tgtgccgcct 
101 gcacccatgc aggctaggcg ttgtccaaca ctggggcacc cgtcactcag 
151 attgagatga gggaciaatga gaggagcctg gaggagagct ccacacaaat 
201 aaagggagaa gcctatgcag gggctggaga ttccttctgt ggtgacagag
251 catggcatag ttagattcac agactnnnnn nagatcgaga gaatgatgcg
301 tgctcttctc atctctcaag cagcaatgca gggggaacat cagctg

Sequence: 3 corresponds to second part of wgla2; Length 57
5' ttgttttttt gatggagtct cactctgttg cccaggctgg agtgcagtgg
51 cgtgatc

Sequence: 4 corresponds to first part of wgla3; Length 217

5' gatccatcca tccttcctcc ctccttccct ctctttctac ctctttctcg
51 ctctctcttt cttccttcct tcctccctcc ctccctcctt ccctccctcc
101 cttcctcccg ccctccttcc cttttccctt cccccttcct ctttctttct
151 ctttctttct ctttctttct tactttcttt ctttctttct ttctttcttt
201 ctttctttct ttctttc

Sequence: 5 corresponds to the second part of wgla3; Length 43
5' gagtctcact ctgttgccca ggctggattg cagtggcagg atc

Sequence: 6 corresponds to wgla9; Length 286 

5' gatcagtttc tgactgctgg gcgggacaaa gcctcctgaa gttgctgcga 
51 ggcacctccccctgtgagcagagcttggtacagcccaaatagttttcagg 
101 ttaagaaagc cagaatcttt gttcagccac actgactgaa cagactttta 
151 gtggggttac ctggctaaca gcagcagcgg caacggcagc agcagcagca 
201 gcagcagcag cagcagcagc agggctcctg ggataactca ggtgagtaga 
251 gagggaattc gcaaacttac cctggagttt tatttc 

Sequence: 7 corresponds to wg1c3; Length 457

5' gatccaatgg ctctttagtc agggtgttat gtcctgaaaa taggtgacaa
51 ctgcaaacca tcctctggtg tccagagact ttaacaaggt ttgtttcaca 
101 gagactgagg gcagaaaaaa ggaaatggcc taaaaaggtg ggtttgctgt 
151 gttgcctcac actacttgat tcatggttct gattctaaaa atctcacttg 
201 atacttgatt tcatatgaaa gacgtgtaaa atgcctgggt agaggcggcg 
251 gcggcggcggcggcgggctcggaggcagcggttgggctcgcggcgagcgg 
301 acggggtcgagtcagtgccgtttgcgccggttggaatcgaagcctcttaa 
351 aatggcagat gatttggact tcgagacagg agatgcaggg gcctcagcca 
401 ccttcccaat gcagtgctca gcattacgta agaatggctt tgtggtgctt 
451 aaaggct 

Sequence: 8 corresponds to wg1c4; Length 370

5' gatcacacca ttgcactcca gcttgggcaa cagagtgaga ctcra
51 acaaaaf^ana gaaagaaaag aaaagaaaga aaaganaaga aaagaaaaga
101 aagagaaaga aagagaaaaa gaaagaagga aggaaggaag gaaggaagga 
151 aggaaggaajg gaagggagga aggaaggaag gaaagcaaga aagaaagaaa 
201 gaaagaaaga aagagaaaga aagaaactat ccaaaccaat ctgatagagc
251 tgaaaaactt actacaagaa tttcataata caatcagaag tattaacaac 
301 aaaatgcacc aagctaagga aagaatctca gaactag£iag acccagttct 
351 ttgaatctat tcagacagac

Sequence: 9 corresponds to wg1c5; Length 367

5' gatctcaata aacattgata ctggagggat gaaatgaagg aaggatggat
51 agaaggctat aaggatgggt ggatggataa atggatggat ggatggatgg
101 atggatggat ggatggatag atggatggat ggaaaaatgg atagatggat
151 gggtggatgg atgaatggat ggatggatgg atggatggat ggatgggtgg
201 atggatgaatatattgggtggatggatgga aggaaggaag gaaggaagga
251 aggaaggaag gaaggaagga aggatggtag aagaaaggta gtaccagtat
301 gctttagctc atgcaggcaaacagatgatg ggcagagggaagcatggtgg
351 ctgattacag gaggatc

Sequence: 10 corresponds to wg1d1; Length 434

5' gatcctcttg cctaggcctc ccaaagtgct gggattacag gcaagagcca
51 ccacgtcccg cctctaattt ctctcctctt ctctcctctc ctttcctttc
101 ctctcctctc ctttcctttc ctcttctctt cttgtttttt tcttttctnc
151 cctccctccc tccctccctc cctccctccc tccttccttc cttccttcct
201 tccttccttc cttccttcct tccttccttt ttgagacaga gttttattct
251 gtcacccaga cctgagtgca atgggcacaa ttttggctca ctgcaacctc
301 catciccccggttcaagtgattcttctgccttagcctcccgaatagctgg
351 aactacaggc acctgccacc atgccccagc taattttttg tattctcagt
401 agagatgggg tttaccatgt tggccaggct gctc

Sequence: 11 corresponds to wg1d5; Length 325

5' gatcgcgcca ctgtactcca gcctgggcga cagagcgaga ctctgtctaa
51 aaaaaaagaa aaaagaaaaa aaagaaagaa tgagagaaag agagaaggaa
101 ggaaggaagg aaaaggaaag agagagagga aggaaggaag gaaggaagga
151 aggaaggaag gaaggaagga aggaagggag ggagggaggg agggagggag
201 ggagggaggg agggaaaggc agggagaaag ttctgggagc tagggagtgc 
251 ccggggtggg gagctccaag aacaagcccc agggagctgt aacaaagact 
301 ttgtcacagc tagcctgaag ctagc 

Sequence: 12 corresponds to wg1d6; Length 263

5' gatcccacct gccatacggt gggatttcta ggactataca aatgacagaa 
51 gggtagtaag aggaagactg tgttgcttaa tgaggtttcc agaaattggt 
101 aatgatattt gtaattccaa atcctactac aaggaactgt ggctacaata 
151 ttgatgctgctgctgctgctgctgctaatttgatgaagtaggctaatccg 
201 catggctaca tctctgtatt agtccattct cgcgctgcta taaagaaact 
251 acctgtgact ggg

Sequence: 13 corresponds to wg1d10; Length 160

5' gatcctgttc atggtacaaa gctttcccta gcagcctgcc ctccctagcc 
51 tgcttacctt gagnngagag gaagctgaag tagcagcagc agcagcagca 
101 gcagcagagt tnccagaaag tgaccccctc ccctgaacac agcaggaagc 
151 agcagtccaa 

Sequence: 14 corresponds to the first part of wg1d11; Length 238
5' gatcatttca gtctgcacaa gaatgcttgg ccttttaatt ccaacttcac
51 agttgagaaa actgatactc aaggcaaaga atcttctcag tagtcagagt
101 caataactgc aggaactaag actggaaccc aagttttctg cctggtatgt
151 tgggcctaga agggaactgc tattcctatc tctccatctt tccttccatc
201 tttccttcct tccttccttc cttaatcctt ccttcctt 

Sequence: 15 corresponds to second part of wg1d11; Length 120
5' ttcctctctc actgtctccc tcnctctctg tctccctcct tcctttcttc 
51 accttctttc tactttttta agaaacaagg tctggctttg tcacccaggc 
101 tggagtgcag tggcgtgatc 

Sequence: 16 corresponds to wg1e1; Length 445

5' gatcttgaga cagggtcatc ctggattact ggagtgtgcc ctaaatccat
51 tgacaagtgt ccttaggaga gacgcagagt ggaggcacac agtgggagga
101 cgaggccact tgaagactga ggccgggatt gcagcgatgc agccacaacc
151 caggaaagtc cggggccacc agcggctgga aaaggcaagg gaggggtctt 
201 ctggctcttc aacaataaga gagtaaattt ctggtgtttt aagccacctg 
251 gtttgtggtg ctttttcctt ccctccttcc ttccttcctt ccttccttcc 
301 ttccttcctt ccttccttcc ttccttccct ccctccctcc ctccctctct 
351 ccctccctcc ctcccttcct tcctcttctt tttctctccc tctctccttt 
401 ttttcttttttttggtggag tcttgctgtgtcgcccaggctggag 

Sequence: 17 corresponds to wg1e4; Length 591

5' gatcccaaaa tactggcctc tcatagtgat agatttaaaa gattgcttct 
51 ttaccattcc tttagctacc caagattatg aaaaatttgc ttttactgtt 
101 ccttctataa ataacaaaga accagtggac agataccatt ggaaagtact 
151 gccacaaggc atgctiaaata gcccgactat ttgtcaaact tatatcggaa
201 aagttatgaagccaattaaa gaacaattttacaaatgttatattatccat
251 tacatggata attttatttg cagctgaaac taaagaggaa ttaatgctat
301 gctacaaaca actggaaaag gcigigactg cagcgggatt aatcaatcat
351 agccctgata aaatccaaac ttctactccc tttcagtatt taggaatgaa
401 agcagaataa agtactatca agcttcaiaaa ggttcaaatt agaagagatg 
451 atttaaaaac tctaaatggc cggcctgcct tccttccttc cctccctccc 
501 tccctccctc cctcccttcc ttccttcctc cctccctctc tctttcgacg 
551 gtctccctct gttgccgagg ctggactgta ctgccalgat c

Sequence: 18 corresponds to wg1e7; Length 485

5' gatcacttga ggccaggagt tcaagaccag cctgagcaac ctagtgaaac
51 cccgttlcta caaaaaataa aaatttaaga aatagctgga tgcagaggca 
101 tctgcctgta gtcccagcta cccaggaggt tgaggaagga gaatcacttg 
151 agcccagaag cttgaggttg tagtaaggaa tgttcatgcc actgcactgc 
201 aacatgggtg acagtgcaag tttctgcctc aaaaggaagg aaggaaggaa 
251 ggaaggaagg aaggaaggaa ggaaggaagg aaggcaggca agaagaaaag 
301 aaggcaggga gagacggagg gaaagacaga aaagaaagaa aacctataaa 
351 aaagtataat cctgtgagtc cacagatgag acagagaaaa atctggaaag 
401 gattttaaaa taagtatgct taaattcttc aaagagacat agaaaggaat 
451 agaacccaca aaataagaat ggaaatattc gaaaa 

Sequence: 19 corresponds to wg1e12; Length 597

5' gatcttatga cattttccca ggacaccaag atataaaacc ccaaccaaca 
51 ttgctactgctaaagtaaac ttttgcctggcttgccaagatttttggcca 
101 agaaatgaga tttcctgagg gtggcattcc ctctgcacta ccaaagtctc 
151 cttctgagac tttttggtca gcttatgaag cttctcaagg caagtgtctg 
201 gttagcatct ccctccctcc actctggaaa tcttaaagct gaaagaatga 
251 atgaatgaat ggatgaatga atgaatgaat gagaagacag agagagagaa
301 ggaaggaagg aaggaaggaa ggaaggaagg aaggaaggaa agaaagaaaa 
351 gaaaagaaag aagagagaaa gagagaaaga aagaaagaga gaaagagaga 
401 gagagagaga gagaaagaga gagaggaaga gaagaagtcc tcttaaaaaa 
451 tagcctgaga aactgggcta tgttggcttt tttttttttc tgtcagtagg
501 aaatatttat tcaacctcac tgctaaaaaa aaaccaaaac aaacaaacaa
551 aaaaacctaa taatttcagg aaagctgctg tttctcgtgt tctgatc

Sequence: 20 corresponds to wg1f2; Length 350

5' gatcacgcca ctgcactcca gcctgggtga cagtgtgaga ccctgtcaag
51 gaacgaacga aggaaggaag gaaggaagga aggaaggaag gaaggaagga
101 aggaaggaag gaaggaagga aggagggaag gaaggaaaga aggcaggcag
151 gcaggcaggc aggaaggaag gaaggaaaga aggaaggaag gaaagaagga 
201 ajggaaggaaa gaaggaagga aggtaggaag gtaggaagga aggaaggaag 
251 gaaagaagga aggaaggaag gcagtcaggg agnaaggaag gaaggcaggc 
301 aggcaggcag gcaggcaggc aggcttgcaa atgtagttaa gttaaagatc 

Sequence: 21 corresponds to wg1f4; Length 283

5' gatcatgcgg gcagctttgg ggtatttcag acggtgtggg gagcatggtc 
51 tgaatgtgcc ttgctccggc agcagcatgc agtagtggca gtggtactta 
101 gggcatgtga gagcaccctg cctctcctat ccctgaccca gcagcatgca 
151 gtagcggcag tggtacttag ggcatgtgag agcacccagc ctctcctatc 
201 cctgacccag cagctggcag cagcagcagc agcagcagca gcagcagccg
251 cctcagggca ggaggcagag ccttcaggcg tgg 

Sequence: 22 corresponds to wg1g5; Length 494

5' gatcaictgca ctccagcctg ggtgacagaa taagacgaaa gagagaaaga 
51 gagagggaaa gaaagaaaga gagagagaga gagagagaga gagagagaaa 
101 gaaagaaaga aagaaagaaa gaaagaagaa agcaagaagg aaggaaggaa 
151 ggaaagaaag cagcagaaaa agaggaaggg agggaggaag gaaggaagga 
201 aggaagggag ggagggaagg aaggaaggaa ggaaggaagg aaggaaggaa 
251 ggaaggaagg aaagagagag agagaaagaa aatannnnnn nnnnnnaact 

301 ccnnnaaacc cacaattcag acacacagct cacacacagg tctccagcat 
351 agacatattt atacatccat ttactcaaac actcacaata caatcacata 
401 aaacaggcag acagttcaca tgccaacaca ctcttgcaca gacacgcaaa 
451 cagaagcatg gaatttgtac agagcacgct cacagtgtct gatc

Sequence: 23 corresponds to wg1g9; Length 301

5' gatcgtgcca ctgtacccca gcctgggcta cagagcgaga ctccatctca
51 aaaaaaaaaa agaaagaaag aaagagagaa agagagagag agagatgaaa 
101 gaaagagaga gagggaaaga aaggaaggaa ggaaggaagg aaggaaggaa 
151 ggaaggaagg caggcaggca ggcaggcagg caggcaggca ggcaggcagg 
201 cggacagcaa gaagacaccg ttttgccatg aggttagaca cgcggacagg 
251 cacagagcag acgcacgtgc accatgctat catggcagga caggttcaca 
301 t 

Sequence: 24 corresponds to wg1h10; Length 538

5' gatcatcaaa atacaattat agaaatattt ataagcagcattattcataa 
51 tcgccaaaaa ctggaagcag tgcgatggca aaatagatgc ataaatggtg 
101 ataagtataa gaggggaaga aagaatgaaa gaaagaatgg aaggaaggaa 
151 ggaaggaggg aaggaaggaa ggagggaagg aaggagggaa ggaaggaggg 
201 aaggagggaa ggaaggaggg aaggaaggaa gggaggaagg gagggaggga 
251 gggagggagg aagggaggga gggagggagg gagggaggga aaggactaga 
301 gggtggaaga tagggagaga aacaagtaaa taagctagct ctttcctaga 
351 aaataatttc accaacgttt ctgtgacatt caagaaaaca actgggactt 
401 ggaaacaatt aaaaataaat aaacaaaagt atgccactag actctaaagt 
451 cagtggtgtg ggaagcagag gttatcagtg ttcagaggag agaagactcc 
501 cacagaatag ggctgtcagg aatgagctca gggaggaa 

Sequence: 25 corresponds to wg2a5; Length 421 

5' gatcacctga ggtcaggagt ttgagaccag cctggacaac atgacaaaac
51 ccctatctaa aaaaaaagaa atagctaggc atggtggtgt gcacctgtag 
101 tcccagctac ttgggaggct gaggcagaga atcacttgaa cccgggaggc 
151 agaggttgca gtgagccgag gaggcgcxac ttcactccag cctctgtctc
201 caaagaaaga aaggaaaaga aagaagggaa gaaagagaga gggaggaaag 
251 aagigggagga aggaaggaag gaaggaagga aggaaggaag ggagggaggg 
301 agggagggag ggagggaggg agggagggag ggagggaaaa agaaaagagg 
351 tgagcacacg gttacattga ggeiaaacaaa gatgaaactt cacatcacat 
401 tccaacaagt cacagcttga t 

Sequence: 26 corresponds to wg2b3; Length 446 

5' gatctcaggt gacccaccag cctcagcctc ccaaagtgct gggattacag 
51 gcctgtgcca ctgcacccag ccatctgttc agtactttca ttataagaga
101 gaaaggagga gagggaaggg aagggaggga aggggagggg aagggaatca
151 atgggaaagg agggtcaaga aggagaagga gagaaggaag gaagggaggg
201 aggaagagag gaaggaagga aggaaagaag gagggaagga aggaaggaaa 
251 gaaggaggga aggaaggaag gagggaggag ggaaggaagg aaggaaggaa 
301 agaaggaagg aaggaaggaa agagggaaga aaggaaggaa ggaaagaaag 
351 aagggaggaa gggagggagg gagggaggga atgagtggna gaagccaagt 
401 ctgcagttgg gaaatcatgg gacgtgctgg cttttcctct ctgatc

Sequence: 27 corresponds to wg2c9; Length 287 

5' gatcacttga gcccaggagt tcaaggctgt ggtaaactat aatcacacta
51 ctgcactcca gcctgggtga cagagaaaga ccctgtctca aaaaaggaag
101 gaaagaagga aggaaggaag gaaggaagga gggagggagg gagggaggaa
151 gggagggagg gagggaagga gggaaggaag gaaggaagga aggaaggaag 
201 gaaggaaata gcagctctga gcttagaaaa aggagtctat ttciaagtgg
251 gagatgggga gaaggaggga actggggagg tgaggaa

Sequence: 28 corresponds to wg0c5; Length 252

5' gatcattagg ttgaaaaaga gctaaaagat gaaaccgatt ggcactggtg 
51 tgtggtggtggtggaggaggtggtggtggcggcggcggtggtggtggtgg 
101 tggtggtggt ggtggcggtg gtggtggtag gaattactca agttactgga 
151 aacatgctgg tatctttttt tagtttaggt agtaaacctg gtaatgaaca
201 ctaagtcaaa caacaaatac taatttccat ctcatgcaca aatgatatga 
251 aa 

Sequence: 29 corresponds to wg0f4; Length 329 

5' gatcagacac tctaaagtca cattccttta gaggaactgg acaatcaaat 
51 tttgatggtg ttctaatggt ttgtaaggca acaaaacaca aaactttgtg
101 gtggtggtgg tggtggtggt ggtggtggtg gtggtatctt ccatcacttg 
151 ccaagggctt agcctggacc tgcacactca ctatctcctt gaccatttgc 
201 accatcacca ggagggaggc actaggtccc ccgttctcac tgttataaat 
251 aacaaacaggtctccaaggggtgagtaactttctcgtggacacacagagg 
301 caggtctagg atttgaaccc agtttgtct 

Sequence: 30 corresponds to wg0f5; Length 276

5' gatctctcta ggtcctattc tctttcaacc ctctagggaa ctcaggaaac 
51 attgggctat tgtccataat gtggtgatgg tggtggtgat ggtggtggtg
101 gtggcggtgatggcagcggcagtggtgatggcgatggcggcagcggcggt 
151 ggtggtggtg gtgtcacccg aggctgcctt ggtccagcca gcacgcagcc 
201 ttclctattc attctctctt gtgtggaccc gtgggggaat tctatgagtc 
251 ttgccacttc anggctccac tcagaa 

Sequence: 31 corresponds to the first part of wg2e7; Length 185 

5' GATCATAGAG CAGGTCACCA GGATGAAGAC TGCATGAAGG GAAGGGCTTT 
51 GATGTACTCA TTGTCCTGGC CCCGGCATGG AGGTGCTGGA AGGCAAGAGG 
101 GAGGAGGAGG GAGGCAGAGA TGGAAGGATG AAGGAGAAGA AGGAAGGAAG
151 GAAGGAAGGG AAGGAGGGAG GGACAGAGGG AGGAT

Sequence: 32 corresponds to the second part of wg2e7; Length 22 (bases 193-214) 
5' GGAAAGTTTT TTTAAAAAGA TC 

Sequence: 33 corresponds to the first part of wg2f7; Length 140 

5' GATCTACATG CATAGTTTAT tTTTTATGTT CTTTTATGTT TGTTAATATG 
51 TAAATATATT TGTGATATAT TATTAAGTNA GAATATCAAC NGCCTTCCTT 
101 CTTTCCNNCC CrCCCCACTT CCCTNCCTTN CCTTCCCAGC 

Sequence: 34 corresponds to the second part of wg2f7; Length 211 (bases 144-354)
5' TCTGACAAGG TCTGTCTCTG TCACCTAGGC TAGAATGCAG TGGTGNAATC
51 AATAGCTCAC TGCAGCCTTG ACCTTATGGA CTCAAGTAAT GCTCCTACCT
101 CAGNNTCCNN ACAGNNGGGA CCTCAGGTGC ATACCACGCT CTGCTAATTT 
151 ATAGAGATGG AGTCTTACCA TTTTGCCTAA GATGGTCTCC AACTCCCGGG 
201 TTCAAGTGAT C 

Sequence: 35 corresponds to wg2f10; Length 374

5' GATCTTGGCT GGGTCAACAC TCCTTCCTGG GCTTCAGnT CTCATCTAAG
51 AAGAGAGAGT TGGAGGATTG TGGTGGGGGG TTGGTCAGTG AAGGTAGGCA 
101 TCCCAGGGTG GGTANCCATG AGGGTCTCTC TAGTCCTTTT TTCTTCTTCA 
151 CCCTTACACt TATCCACCCA TCCAACCATC CATCCATCCA tCCATCCATC 
201 CATCCATCCA TCCA l 111 i I Cll 1 11 ICl 1 I Tl 1 I CTTTT TltGAGATGG
251 AGtCTTGCTC TGTTGCCCAG GCTGGAGTGC AGTGGCATGA TGTCAGCTCA
301 CtGCAACCTC TGCCTCCTGA GTTAGAGTGA TTTTCCTGCC TGAGCCTCCT
351 GAGTAGCTGG GACTATAGGC ACAC 

Sequence: 36 corresponds to the first part of wg2g4; Length 106

5' GATCACCTGA GGGAGCTCAA GACCAGCCTG GCCAACATGA TGAAACCCCG
51 TATATACTAA AAAGTACAAA AAATCANNNG GGTGTGTGGT GGGANTGTAA 
101 TNTTAG 

Sequence: 37 corresponds to the second part of wg2g4; Length 124 (bases 114-237)

5' GAAAGAAAGA AAGAAAGAAA GAAAGCAAGC AAGCAAGCAA GCAAGCAAGC
51 AAGCAAGCAG GCAGGCAAGN NAGCGGCGTC ACGCCNGTAA tcccagcagt
101 ttgggaggcc gaggcgggca gatc

Sequence: 38 corresponds to the first part of wg2g12; Length 213

5' gatcatttcc cagtacataa ggacctgttt ctctcctgct aacattaacc 
51 ctacttgaga cttagagaaa gaggcatcac acttgaaagt ctcctgtggg 
101 tataatgtct actctttgtt tcatgaaagg atatcgtggg gtggtagctt 
151 tttggttttc tttctctctt tctctctttc tttctttctt tctttctttt 

201 CTTTCTTTCT TTC 

Sequence: 39 corresponds to the second part of wg2g12; Length 67 (bases 224-288)

5' TTCCTTCCNT CTTTTTTGTG GATGGAGTTC TGCTCTGTCA CCCTGGCTGG 

51 AGCGCAGTGG CACGATC 

Sequence: 40 corresponds to the first part of wg2h11; Length 97

5' GATCGCACAC TGCACTCCAG CCTGGCAACA GAGGGAGACT TCATCAGAGA
51 CAGAGAGAGA CAGAAAGAGA GAGAGATAGA GAAAGGGAGG GAGGGAG 

Sequence: 41 corresponds to the second part of wg2h11; Length 95 (bases 105-199)

5' AAGGAAGGAA AGAAGGAAGG AAGGAAGGAA GGAAAAAAQA AAAGAGAAAA
51 AAAAAGGAGA GAGGTTGAAA AAAACAACTA CCTTGTGGTC AGATC

Sequence: 42 corresponds to wg3a6; Length 278
5' GATCACtTAG CCTGGGAGGT TGAGGCTGCA GTGAGTCATG ATTTTGCCAC
51 TACTGCATtC CAGCCTGAGT GACAGAGCCA ACCTGTCTTG AAAGAAAGAA
101 AGAAAAGAAA GAAGGAAAGA AAGAAAGAAA GAAAGAAAGA GAGAAAGAAA
151 GAAAGAAGGA AAGAAAGGAA GGGAAAGAAA GAAAGGAG(3G AGGGAAGGAG 
201 GGAGGGAGGG AAGGAGGGAG GGAGGGAGGG AGTATAAGAT GTATdCCCTF 
251 AGCAAATGTT TAAATACACA GTATAGTT 

Sequence: 43 corresponds to the first part of wg3b10; Length 204

5' GAtCAAAACT GAGAAGCGCA AAGACAAAGA GTGTGCTTGT TGAATACCAA
51 GTTGTATAGG CTGCAGAAGA GGAAGTGGTG GGACTGGAGT ctagagagtc
101 TTGAACACCA GGTTTGGGAG tCTGGAGTTC ACTTGGTGAG TAACAATCTC
151 TGGCAGAGGA AGACTCCGTC TGAAAGAAAG AAAGAAAGAG AGAGAGAGAG
201 AGAG 

Sequence: 44 corresponds to the second part of wg3b10; Length 85 (bases 212-296)

5' AAAGAAAGAA AGAAAGAAAG AAAGAAAGAA AGAAAGAGAG GAAAGAAAGA

51 AAGAAAAGAA AAAAAGGAAA GGAATGAAAG GGATC 

Sequence: 45 corresponds to the first part of wg3f12; Length 140

5' GATCATGCTA CTGCACTCCT GCCTGGACGA CAGATTGAGA CCCCATCTCG
51 GAAGGAApGA AGGAAGGAAG GGAGGGAGGA AGGAAGGAAG GAAGGGAGGG
101 AGGGAGGGAG GAAGGGAGGG AGGGAGGGAG GGAGGAAAAC

Sequence: 46 corresponds to the second part of wg3f12; Length 153 (bases 150-302)

5' ATAGAAAGTA AGAAAGAAAG GAAACAATTG TGTGATGCAC AGCTTtiGTGC
51 ACiTGAGGNTT TTTTTGCCTC CAAGGTTTTG GGACAAGAAG GCACACAGAG
101 AATTAAAGGA GTCCAGAGTT ACTTGCTGTC CTGAtATAGA TCCACTAGTT
151 CTA

Sequence: 47 corresponds to wg3h2; Length 278

5' GATCTTTTTG GCTTTTTGGC ATAACATGGC TGGCAGAGCT CAAATTGTTT
51 TTATCAGCTT AGTtACCTCT ACCCAGTAGA AATACAACTG CTGAAATTGT 
101 AATTAGGTCT TTTATATTCC TCTCCTTCCT CCCTCCCTCC CTCCCTCCCT 

151 CCGTCCCTCC CTTCCTCCCT TCCTTCCTTC CTTTCTTCCC TACCCCCCTC 
201 TCTTTCTTCT TTTTATTTCC TTGTTTATTT CTGTCTAGCA CTAGATTTCA 
251 TGGGAGACAT AGACTAAGAT ATAAATTT 
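
By way of illustration only, the repeat tracts in the sequences listed above can be located computationally. The following is a minimal sketch, assuming that only perfect (uninterrupted) repeats with a unit of 2 to 6 bases are of interest; the function name `find_tandem_repeats` and its thresholds are hypothetical and form no part of the described method. Ambiguous bases (n) and unreadable characters simply break a run.

```python
import re


def find_tandem_repeats(raw, min_unit=2, max_unit=6, min_copies=5):
    """Return (unit, copies, start) for each perfect tandem repeat run.

    `raw` should contain sequence rows only (no header lines).
    `start` is a 0-based offset into the cleaned sequence.
    """
    # Drop row coordinates, spaces and line breaks; keep base letters only.
    seq = re.sub(r"[^acgtn]", "", raw.lower())
    # The lazy group finds the shortest repeating unit at each position; the
    # backreference then requires at least (min_copies - 1) further copies.
    pattern = re.compile(
        r"([acgt]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip single-base runs such as "aaaaaaaaaa"
        hits.append((unit, len(m.group(0)) // len(unit), m.start()))
    return hits


# Fragment taken from the row numbered 151 of Sequence 12 above.
fragment = "ttgatgctgctgctgctgctgctgctaatttgatg"
print(find_tandem_repeats(fragment))
```

For this fragment the sketch reports a single trimeric run of seven copies. The unit is reported in whichever frame the scan meets first (here tgc), so the same tract could equally be written as (gct)n or (ctg)n.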

CLAIMS

1. A method for the identification from DNA of a fragment comprising a
simple tandem repeat locus comprising the steps of: 

i) contacting a DNA library with at least one hybridisation probe so as to 
identify a population of DNA fragments enriched for simple tandem repeats; 

ii) isolating and cloning said population; and 

iii) screening of the resulting DNA library so as to identify an individual 
fragment comprising a simple tandem repeat locus. 

2. A method according to claim 1 wherein the DNA library comprises a 
genomic DNA library. 

3. A method according to either one of claims 1 and 2 wherein the DNA 
library comprises genomic human DNA fragments. 

4. A method according to any one of claims 1 to 3 wherein the DNA library
comprises subgenomic DNA fragments. 

5. A method according to any one of the preceding claims wherein the 
average fragment size within the DNA library is less than about 1.5 kilobases.

6. A method according to any one of the preceding claims wherein the 
average fragment size within the DNA library is less than about 1 kilobase. 

7. A method according to any one of the preceding claims wherein the 
average fragment size within the DNA library is from about 400bp to about 1000bp.

8. A method according to any one of the preceding claims wherein the
hybridisation probe or probes is immobilised on a solid phase.

9. A method according to claim 1 wherein the solid phase comprises a 
nylon membrane. 

10. A method according to any one of the preceding claims wherein the 
hybridisation probe or probes identifies a particular class of simple tandem repeats. 

11. A method according to claim 9 wherein the class of simple tandem 
repeats is selected from the group of dimeric, trimeric, tetrameric, pentameric and hexameric
tandem repeats. 

12. A method according to any one of the preceding claims wherein the 
hybridisation probe or at least one of the hybridisation probes comprise a tandemly repeated
region of greater than 200bp. 

13. A method according to claim 12 wherein the probe or probes comprise 
repeats having at least 70% similarity to a given repeat sequence. 

14. A method according to either one of claims 12 and 13 wherein the
probe or probes comprise repeats having at least 80% similarity to a given repeat sequence.

15. A method according to any one of claims 12 to 14 wherein the probe or 
probes comprise repeats having at least 90% similarity to a given repeat sequence. 

16. A method according to any one of the preceding claims wherein the 
hybridisation probes comprise a set of mixed trimeric or tetrameric repeat DNA.

17. A method according to any one of the preceding claims wherein the
fragment or fragments comprising a simple tandem repeat locus so identified is subsequently
amplified prior to cloning.

18. A method according to claim 17 wherein the amplification is effected by
PCR.

19. A method according to claim 18 wherein universal linker sequences are
ligated to the end or ends of the fragment or individual fragments.

20. A method according to either one of claims 18 and 19 wherein universal
linker sequences are ligated to the end or ends of the fragment or individual fragments prior
to the identification of the fragment or fragments.

21. A method according to either one of claims 19 and 20 wherein linker 
sequence specific primers are used to amplify the enriched population. 

22. A method according to claim 21 wherein the linker sequences are 
removed subsequent to amplification and prior to cloning.

23. A method according to any one of the preceding claims for the
identification from genomic DNA of a fragment comprising a simple tandem repeat locus
comprising the steps of:

i) ligating universal linker sequences to the ends of fragments comprised
in a genomic DNA library so as to form a library for PCR amplification;

ii) contacting said PCR library with at least one hybridisation probe so
as to identify a population of library fragments enriched for simple tandem repeats;

iii) separating and amplifying said population by PCR; and

iv) cloning and screening the resulting amplification products so as to
isolate an individual fragment comprising a simple tandem repeat locus.

24. A method according to any one of the preceding claims wherein 
screening is effected using at least one hybridisation probe comprising at least part of a 
simple tandem repeat.

25. A simple tandem repeat for use in a method of treatment or diagnosis of 
the human or animal body characterised in that it may be amplified at least in part by PCR 
using any one of pairs 1 to 37 of primers. 

26. A simple tandem repeat according to claim 25 wherein it comprises at 
least the sequence of at least one of sequences 1 to 47. 

27. A simple tandem repeat according to either one of claims 25 and 26 
wherein it is polymorphic.

28. A simple tandem repeat according to any one of claims 25 to 27 wherein
it has a heterozygosity of at least 80%. 

29. A simple tandem repeat according to any one of claims 25 to 28 wherein
it has a heterozygosity of at least 85%.

30. A simple tandem repeat according to any one of claims 25 to 29 wherein 
it has a heterozygosity of at least 90%. 

31. A pair of amplification primers for use in a method of treatment or
diagnosis of the human or animal body specific to any one of the simple tandem repeats of
any one of claims 25 to 30.

32. A pair of amplification primers according to claim 31 wherein they are
PCR primers. 

33. A probe for use in a method of treatment or diagnosis of the human or 
animal body specific to at least part of any one of the simple tandem repeats of any one of 
claims 25 to 30. 

34. A method of genetic characterisation of the human or animal body 
wherein sample DNA is characterised by reference to at least one of the simple tandem 
repeats, primers and probes of any one of claims 25 to 33. 

35. A method of genetic characterisation according to claim 34 wherein it 
comprises either the use of at least one pair of amplification primers or probe of any one of 
claims 31 to 33. 

36. A method of genetic characterisation according to either one of claims
34 and 35 wherein it is a genetic mapping study. 
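
By way of illustration only, the heterozygosity thresholds recited in claims 28 to 30 can be checked against genotype data. The claims do not state how heterozygosity is calculated, so the following minimal sketch makes two assumptions: observed heterozygosity is taken as the fraction of typed individuals carrying two different alleles, and expected heterozygosity as one minus the sum of squared allele frequencies. The function names and the genotypes (allele sizes in repeat units) are hypothetical.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two alleles differ."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2), with p_i the frequency of allele i in the sample."""
    alleles = [allele for pair in genotypes for allele in pair]
    counts = Counter(alleles)
    total = len(alleles)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical genotypes for five individuals at one locus.
sample = [(10, 12), (11, 11), (9, 12), (10, 13), (12, 14)]
print(observed_heterozygosity(sample))
print(expected_heterozygosity(sample))
```

A real survey would use far more than five individuals; the sample here is kept small so that the arithmetic can be checked by hand.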

Figure 3

Figure 4